Abstract: A probability distribution for multilayer perceptron
artificial neural net outputs is derived assuming a sigmoidal
activation function. This distribution is known to be a member
of the Johnson system of distributions. Using this distribution,
theoretical receiver operating characteristic curves can be
developed to obtain recognition differential values for
corresponding values of the probability of false alarm.
Application of these techniques to the detection of broadband
signals is presented.

I.  Introduction 

In this paper, we consider a feedforward multilayer perceptron
trained with back propagation of error. The outputs of the nodes
in one layer are transmitted to nodes in another layer through
links that amplify or attenuate such outputs through weighting
factors. Except for the input layer nodes, the net input to each
node is the sum of the weighted outputs of the nodes in the prior
layer. Each node is activated in accordance with the input and
bias to the node, and the activation function of the node. The
typical activation function for the nodes in the hidden layers is
the common sigmoid or logistic function,

$$f(a) = \frac{1}{1 + e^{-(a + \theta)/\theta_0}}. \qquad (1)$$

This function is also known, in neural net terminology, as the
squashing function. In (1), the parameter $\theta$ serves as a
threshold or bias and the parameter $\theta_0$ modifies the shape
of the sigmoid. It is the objective of this paper to evaluate the
performance of the neural net by modeling the class conditional
probability density functions $p_{c_i|\mathbf{x}}$, $i = 0, 1$,
for noise alone and for signal plus noise, respectively, using
the sigmoidal squashing function. Although the detection and
false alarm statistics are unchanged, it will be seen that the
bias and shape parameters characterize these distributions. Here
$\mathbf{x}$ is the input feature vector to be classified, and
the output class $c_0$ represents the noise alone case, while
class $c_1$ represents the signal plus noise case. The
performance metric presented in this paper is based on the class
conditional pdf's and is known as the receiver operating
characteristic (ROC) curve, which presents the probability of
detection $P_d = \int_t^\infty p_{c_1|\mathbf{x}}(\tau)\,d\tau$
as a function of detection threshold $t$, where $t$ is chosen to
achieve some prescribed level of probability of false alarm,
$P_{fa} = \int_t^\infty p_{c_0|\mathbf{x}}(\tau)\,d\tau$.
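
As a minimal sketch of the squashing function in (1), assuming the common form $f(a) = 1/(1 + e^{-(a+\theta)/\theta_0})$ with bias $\theta$ and shape $\theta_0$, one might write the following. The name `squash` and the default parameter values are illustrative assumptions, not taken from the paper.

```python
import math

def squash(a, theta=0.0, theta0=1.0):
    """Logistic squashing function: 1 / (1 + exp(-(a + theta) / theta0)).

    theta shifts the curve (bias); theta0 > 0 flattens or sharpens it.
    The default values are illustrative assumptions.
    """
    z = (a + theta) / theta0
    # Branch on the sign of z so exp() never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

print(squash(0.0))              # midpoint of the curve when theta = 0
print(squash(2.0, theta0=4.0))  # a larger theta0 gives a flatter curve
```

Because the function is strictly increasing, a threshold applied to the output corresponds to exactly one threshold on the net input, which is why the detection and false alarm probabilities can be computed in either domain.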

In Section II, it will be shown that under certain conditions the
pre-sigmoided hidden layer input is an approximately normal
random variable. It follows that, in these cases, the sigmoided
output is a logistic transformation of an approximately normal
random variable. In Section III, the distribution of this
transformation will be derived and identified as a member of the
Johnson system of distributions. Using this model identification,
ROC curves can be plotted as a function of detection threshold
$t$ and parameterized by signal-to-noise ratio (SNR). Another
metric of practical utility is the recognition differential (RD),
which is the SNR that yields a probability of detection equal to
1/2 for a prescribed level of false alarm probability; in
practical applications, this may in fact be the preferred metric.
This point will be made in Section IV for the detection of
certain broad-band transient signals generated by simulation in
our laboratory.

II.  Sums  as  Gaussian  Distributions 

We consider now a multilayer perceptron with $M$ continuous
valued inputs $\mathbf{x} = (x_1, \ldots, x_M)$,
$x_k \in (-\infty, \infty)$, and two layers of hidden nodes.
Taking the Bayesian approach allows us to consider $x_k$ as a
random variable; we will assume that its mean $\mu_k$ and
variance $\sigma_k^2$ are finite. Without loss of generality, we
assume that $\mu_k = 0$. The net input to hidden node $j$ may be
expressed as

$$X_j = \sum_{k=1}^{M} w_{kj} x_k,$$

for real-valued weights $\{w_{kj}\}$.

The requirement that the terms $w_{kj} x_k$ be uniformly small,
which is known as the Lindeberg condition [1], is a sufficient
condition to ensure that the sums $X_j$, properly normalized,
converge to the normal distribution. We have made the empirical
observation in the laboratory that, in certain cases, the
independent random variables $w_{kj} x_k$ do, in fact, have a
uniformly small effect on the sum $X_j$. Thus, our contention
that, in these cases, the sums $X_j$ follow a normal distribution
is borne out by the application of the central limit theorem as
described above.

For the present application, $M$ is sufficiently large (on the
order of 1440) and the $x_k$ are sufficiently decorrelated to
ensure a high degree of statistical independence in the collected
samples, so we may, in fact, invoke the central limit theorem to
assert normality when the Lindeberg condition holds. To ensure
independence when $x_k$ is a portion of a continuous time series,
we assume that the series has been sampled at intervals longer
than its decorrelation time. We could also pre-whiten the time
series by Gram-Schmidt orthogonalization techniques to ensure a
high degree of independence.
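
The argument of this section can be illustrated with a small simulation. The sketch below assumes independent Uniform(-1, 1) inputs and randomly drawn weights, neither of which comes from the paper; only $M = 1440$ is taken from the text. The largest single-term share of the total variance is used as a simple proxy for the "uniformly small" requirement, not as a formal check of the Lindeberg condition.

```python
import math
import random
import statistics

def standardized_sums(weights, n_samples, rng):
    """Draw n_samples of X_j = sum_k w_kj * x_k, scaled to unit variance.

    Inputs x_k are independent Uniform(-1, 1) draws (zero mean, variance 1/3);
    this input distribution is an illustrative assumption.
    """
    sigma_j = math.sqrt(sum(w * w for w in weights) / 3.0)
    return [
        sum(w * rng.uniform(-1.0, 1.0) for w in weights) / sigma_j
        for _ in range(n_samples)
    ]

def largest_term_share(weights):
    """Largest single-term share of the total variance (small is good)."""
    squares = [w * w for w in weights]
    return max(squares) / sum(squares)

rng = random.Random(0)
M = 1440  # number of inputs, as in the present application
weights = [rng.gauss(0.0, 1.0) for _ in range(M)]  # hypothetical weights
z = standardized_sums(weights, 500, rng)

inside = sum(abs(v) < 1.96 for v in z) / len(z)
print("largest variance share:", largest_term_share(weights))
print("mean, stdev:", statistics.fmean(z), statistics.pstdev(z))
print("fraction within +/-1.96:", inside)  # near 0.95 for a normal variable
```

If one weight dominated the sum, the variance share would approach 1 and the standardized sums would inherit the non-normal shape of that single input.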

III.  Sigmoidal  Squashing  Function  and  the 
Johnson  Distributions 

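Recall the squashing function, $f(a)$, defined in (1),

$$f(a) = \frac{1}{1 + e^{-(a + \theta)/\theta_0}}, \quad -\infty < a < \infty.$$

Assuming that $X_j$ has an approximately normal distribution with
mean 0 and variance $\sigma_j^2 = \sum_{k=1}^{M} w_{kj}^2 \sigma_k^2$,
as asserted in the previous section, the probability density
function, $p_{c|\mathbf{x}}(s)$, of $f(X_j)$ can be easily
derived. In fact, by the change of variables formula,

$$p_{c|\mathbf{x}}(s) = \phi(z)\,\frac{dz}{ds}, \quad 0 < s < 1, \qquad (2)$$

where $\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$,
$-\infty < z < \infty$, and
$z = \left(\theta_0 \ln\frac{s}{1-s} - \theta\right)/\sigma_j$.
After differentiating $z$ with respect to $s$ in (2), which gives
$dz/ds = \theta_0 / (\sigma_j\, s(1-s))$, we obtain

$$p_{c|\mathbf{x}}(s) = \frac{\eta_j}{\sqrt{2\pi}\, s(1-s)} \exp\left\{-\frac{1}{2}\left[\gamma_j + \eta_j \ln\frac{s}{1-s}\right]^2\right\}, \quad 0 < s < 1, \qquad (3)$$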

where $\eta_j = \theta_0/\sigma_j$ and $\gamma_j = -\theta/\sigma_j$.
The density in (3) is a member of the Johnson system of
distributions and its properties are well-known [2]. In fact,
maximum likelihood estimates of $\gamma_j$ and $\eta_j$ are
given by

$$\hat{\gamma}_j = -\bar{X}_j/s_j, \quad \hat{\eta}_j = 1/s_j, \qquad (4)$$

where $\bar{X}_j = \frac{1}{n}\sum_{i=1}^{n} X_{ji}$ and
$s_j^2 = \frac{1}{n}\sum_{i=1}^{n} (X_{ji} - \bar{X}_j)^2$, and
where $\{X_{ji} : i = 1, \ldots, n\}$ are sampled from the hidden
layer at node $j$.

Figure 1: Fitted Normal Density Plots for Pre-Sigmoided Values
Indexed by Signal Level Offsets (in dB) Including the Case of
Noise Only.
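
The density in (3) and the estimates in (4) can be sketched as follows. This is a minimal illustration that assumes the pre-sigmoided values already include the bias and that the output is $s = 1/(1 + e^{-X})$, so that $\gamma + \eta \ln\frac{s}{1-s}$ is a unit normal variable. The function names and the synthetic sample are hypothetical.

```python
import math
import random

def johnson_sb_pdf(s, gamma, eta):
    """Density of s on (0, 1) when gamma + eta*ln(s/(1-s)) ~ N(0, 1)."""
    if not 0.0 < s < 1.0:
        return 0.0
    z = gamma + eta * math.log(s / (1.0 - s))
    return eta / (math.sqrt(2.0 * math.pi) * s * (1.0 - s)) * math.exp(-0.5 * z * z)

def fit_from_presigmoid(x):
    """Estimates in the form of (4): gamma = -mean/s, eta = 1/s.

    x holds pre-sigmoided values; s is the maximum likelihood (1/n)
    standard deviation.
    """
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return -mean / s, 1.0 / s

rng = random.Random(1)
# Hypothetical pre-sigmoided sample; the mean and spread are made-up values.
sample = [rng.gauss(-1.2, 0.2) for _ in range(5000)]
gamma_hat, eta_hat = fit_from_presigmoid(sample)
print(gamma_hat, eta_hat)

# Midpoint-rule check that the fitted density integrates to about 1.
N = 20000
area = sum(johnson_sb_pdf((i + 0.5) / N, gamma_hat, eta_hat) for i in range(N)) / N
print(area)
```

The fit works entirely on the pre-sigmoided values, which is what makes the estimates so simple: the Johnson parameters are just the normal mean and standard deviation in disguise.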

IV.  Receiver  Operating  Characteristic  Curves 
for  Broad-Band  Signals 

The pre-sigmoided values were collected from the data sets
described in the Appendix for the five cases of signal mixed with
noise at various levels offset in 1 dB increments from a
reference signal and for the noise alone case. Fitted normal
density plots for these values are shown in Fig. 1. In each of
these cases, the observed pre-sigmoided values passed the
Kolmogorov-Smirnov (K-S) and the chi-square goodness of fit tests
for normality at the 5% level of significance, as we had asserted
in Section II.
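
A K-S check of this kind might look like the sketch below. The paper does not say how its tests were run, so this is only one plausible procedure: it fits the normal mean and standard deviation to the sample and compares the K-S distance with the large-sample 5% critical value $1.358/\sqrt{n}$. That critical value applies to a fully specified null distribution, so with estimated parameters the comparison is conservative. The samples are synthetic.

```python
import math
import random
from statistics import NormalDist

def ks_statistic_normal(x):
    """K-S distance between the empirical CDF of x and a fitted normal CDF."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    fitted = NormalDist(mean, sd)
    d = 0.0
    for i, v in enumerate(sorted(x)):
        cdf = fitted.cdf(v)
        # The empirical CDF jumps from i/n to (i+1)/n at v; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

def ks_critical_5pct(n):
    """Large-sample 5% critical value for a fully specified null CDF."""
    return 1.358 / math.sqrt(n)

rng = random.Random(2)
normal_sample = [rng.gauss(-1.2, 0.2) for _ in range(500)]    # synthetic
skewed_sample = [rng.expovariate(1.0) for _ in range(500)]    # synthetic
for name, data in (("normal", normal_sample), ("exponential", skewed_sample)):
    d = ks_statistic_normal(data)
    print(name, round(d, 4), d < ks_critical_5pct(len(data)))
```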

Fig. 2 shows the corresponding fitted Johnson density plots,
where the form of the pdf is given by (3). The fits were based on
the parameter estimates, $\hat{\gamma}_j$ and $\hat{\eta}_j$,
given in (4). The signal-to-noise ratio (SNR) in dB offset is
noted on each density curve. The SNR values given have been
calculated for the simulated broad-band transient signals
according to recent work described in [3]. The reader is referred
to that source for the technical details for the SNR
calculations.

Figure 2: Fitted Johnson Density Plots for Sigmoidal Outputs
Indexed by Signal Level Offsets (in dB) Including the Case of
Noise Only.

Fig. 3 shows the fitted normal cumulative distribution functions
overplotted with the empirical distribution function for the
pre-sigmoided values. Fig. 4 shows the corresponding
semi-empirical fits of the Johnson cumulative distribution
functions (smooth curves) to the empirical distribution functions
of the sigmoided outputs (i.e., activation levels) for the five
levels of signal power considered as well as the noise alone
case. These semi-empirical fits using the Johnson distribution
were deemed statistically close to the empirical observations as
measured by the K-S goodness of fit test performed at the 5%
level of significance.

Finally, Fig. 5 gives the ROC curves for the various signal level
offsets based on the semi-empirical (i.e., fitted) Johnson
distributions. These plots show $P_d$ as a function of $P_{fa}$.
One sees that for a prescribed level of $P_{fa}$ of $10^{-5}$ and
$P_d = 1/2$, the signal level offset is very close to -10 dB.
This value is known as the recognition differential offset or RD
offset (for the prescribed level of $P_{fa}$).
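
A theoretical ROC point can be computed directly from the fitted normal models of the pre-sigmoided values, since the sigmoid is strictly increasing and therefore leaves $P_{fa}$ and $P_d$ unchanged. The sketch below is a hedged illustration of that idea. The function name and all numerical parameters are placeholders, not the fitted values behind Fig. 5.

```python
from statistics import NormalDist

def roc_point(pfa, noise_mean, noise_sd, signal_mean, signal_sd):
    """Return (threshold, Pd) for a given Pfa under normal pre-sigmoid models.

    The sigmoid is strictly increasing, so a threshold on the pre-sigmoided
    value gives the same Pfa and Pd as the matching threshold on the output.
    """
    threshold = NormalDist(noise_mean, noise_sd).inv_cdf(1.0 - pfa)
    pd = 1.0 - NormalDist(signal_mean, signal_sd).cdf(threshold)
    return threshold, pd

# Illustrative parameters only; they are not the fitted values from the paper.
noise = (-1.2, 0.2)
for signal_mean in (-0.8, -0.4, 0.0):
    curve = [roc_point(p, *noise, signal_mean, 0.25)[1] for p in (1e-5, 1e-3, 1e-1)]
    print(signal_mean, [round(v, 4) for v in curve])
```

Sweeping `pfa` over a grid for each signal level traces one ROC curve per level. The level whose curve passes through $P_d = 1/2$ at the prescribed $P_{fa}$ is the RD offset.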

Also, Fig. 6 shows a plot of the means and standard deviations of
the five signal levels considered as a function of the level
offset. Beyond allowing a straightforward interpolation, the
fitted least squares line also allows us to extrapolate the mean
and standard deviation of non-simulated signal levels. The
equations for the mean and standard deviation fits are,
respectively,

$$y = 1.3722 + 0.17381\,x, \qquad (5)$$

$$y = 0.22241 - 0.004246\,x, \qquad (6)$$

where $x$ is the signal level offset in dB.

Figure 3: Empirical Distribution Functions for Pre-Sigmoided
Values Overplotted with Semi-empirical Normal Cumulative
Distribution Functions.

Figure 4: Empirical Distribution Functions for the Sigmoidal
Outputs Overplotted with Semi-empirical Johnson Cumulative
Distribution Functions.

Figure 5: Semi-empirical Receiver Operating Characteristic Curves
Using Johnson Distributions for the Sigmoided Outputs Indexed by
Signal Level Offsets (in dB).

Figure 6: (a) Means and (b) Standard Deviations of Five Cases
with Linear Fits.

Now to obtain the RD given any prescribed level of
$P_{fa} = \alpha$, we observe that the detection threshold,
$t_d(\alpha)$, is given by

$$t_d(\alpha) = s_0\,\Phi^{-1}(1 - \alpha) + m_0, \qquad (7)$$

where $m_0 = -1.21474$ and $s_0 = 0.196038$ are the mean and
standard deviation, respectively, of the noise, and $\Phi^{-1}$
is the inverse of the unit normal cumulative distribution
function. Using (5), it is easy to see that the RD at level
$\alpha$ must satisfy

$$RD = \frac{t_d(\alpha) - m_1}{m_2},$$

where $m_1 = 1.3722$ and $m_2 = 0.17381$. Together with (7), we
have finally,

$$RD = \frac{s_0\,\Phi^{-1}(1 - \alpha) + m_0 - m_1}{m_2}. \qquad (8)$$


Figure 7: Recognition Differential as a Function of $P_{fa}$.


Finally, Fig. 7 plots RD as a function of $P_{fa}$ given by (8).
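
The RD calculation can be checked numerically. The sketch below reads (7) as $t_d(\alpha) = s_0 \Phi^{-1}(1-\alpha) + m_0$ and (8) as that threshold mapped through the inverse of the mean fit (5). The constants are the ones quoted in the text; the function names are illustrative.

```python
from statistics import NormalDist

# Constants quoted in the text: noise mean and standard deviation, and the
# intercept and slope of the mean fit (5).
M0, S0 = -1.21474, 0.196038
M1, M2 = 1.3722, 0.17381

def detection_threshold(alpha):
    """Eq. (7): threshold on the pre-sigmoided value for Pfa = alpha."""
    return S0 * NormalDist().inv_cdf(1.0 - alpha) + M0

def recognition_differential(alpha):
    """Eq. (8): level offset (dB) whose fitted mean equals the threshold."""
    return (detection_threshold(alpha) - M1) / M2

for alpha in (1e-3, 1e-5):
    print(alpha, round(recognition_differential(alpha), 2))
```

Under this reading the offsets come out at about -11.4 dB for $P_{fa} = 10^{-3}$ and about -10.1 dB for $P_{fa} = 10^{-5}$, in line with the values quoted in the text. The construction gives $P_d = 1/2$ because the threshold sits at the mean of a symmetric (normal) signal-plus-noise distribution.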


V.  Conclusions

For a prescribed level of $P_{fa}$ of $10^{-3}$, the RD may be
extrapolated by noting a functional relationship of the means and
variances of the normal distributions of the pre-sigmoided values
as a function of the signal level offset (cf. Fig. 1).
Extrapolating this value gives an RD of approximately -11.4 dB
(cf. Fig. 7) offset from the reference signal.

As mentioned in Section II, the invocation of the central limit
theorem for establishing the normality of the weighted sums of
sigmoided neuron outputs is only applicable when the Lindeberg
condition holds. Our most recent work [4] demonstrates that we
can, in fact, fit the Johnson system of distributions to
empirical distributions of various shapes and that, like the
results described in this paper, these generalized fits can also
describe the location and shape of the distributions in terms of
the input SNR levels.

Appendix

A strong SNR broadband transient event was digitized from a
recorded data tape. A 30-second interval, which contained the
signal with a trailer of noise, was then repeatedly played to a
tape recorder for two hours. There were no gaps between the
beginning and ending noise samples of the captured interval. Edge
effects were an initial concern but none have been observed. The
recorded tape of repeated events was designated as the signal
master tape. Using an analog signal attenuator, bandpass filtered
output from the signal master was then recorded to a series of
tapes with the analog attenuator offset one dB from the reference
signal, per recorded tape. The initial tape, which was
consequently recorded at the strongest signal level, was
designated as the reference signal level. Each two-hour tape
stored 240 repetitions of the same event.

Finally,  all  the  signal  tapes  were  analog  mixed  with  a 
two  hour  period  of  ambient  ocean  noise.  For  a  given  time 
on  the  noise  tape,  the  time  of  occurrence  of  the  events 
varied  up  to  within  a  few  seconds.  None  of  the  events 
were  mixed  with  the  noise  to  within  the  same  sampling 
interval.  By  not  playing  a  signal  tape,  a  noise  only  tape 
was  recorded.  For  all  the  test  recordings,  the  two-hour 
noise  interval  was  simply  repeated  for  each  signal  tape 
with  all  other  system  parameters  fixed. 

Each tape was played into a realtime classification system which
utilizes in-house developed artificial neural networks. For noise
only, the output activation levels were blocked into contiguous
signal length durations and the maximum activation level was
recorded for each block. Those values were used to calculate the
empirical distribution of noise only. For each signal tape, the
maximum activation within plus or minus the signal duration of
each event was then recorded. Those values were used to calculate
the signal present distributions.
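
The blocking step for the noise-only case can be sketched as follows. This is an illustration under stated assumptions: the block length is given in samples, a trailing partial block is dropped, and the function names and the toy trace are hypothetical.

```python
def block_maxima(activations, block_len):
    """Maximum of each contiguous, non-overlapping block of block_len samples.

    A trailing partial block is dropped; that choice is an assumption.
    """
    if block_len <= 0:
        raise ValueError("block_len must be positive")
    n_blocks = len(activations) // block_len
    return [
        max(activations[b * block_len:(b + 1) * block_len])
        for b in range(n_blocks)
    ]

def empirical_cdf(values):
    """Return sorted values paired with the empirical CDF at each one."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]

# Toy activation trace; real traces would come from the classifier output.
trace = [0.10, 0.30, 0.20, 0.05, 0.40, 0.15, 0.25, 0.22]
maxima = block_maxima(trace, 3)
print(maxima)
print(empirical_cdf(maxima))
```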


